More ggplot2!

Dataset

# ## Keep in mind!
# ## Maybe for the exercise?
# tuesdata <- tidytuesdayR::tt_load('2025-08-26')
# tuesdata$billboard %>% 
#   filter(song == "Love the Way You Lie") %>% 
#   select(danceability, energy, happiness)
# 
# 
# ## Netflix
# tuesdata <- tidytuesdayR::tt_load('2025-07-29')
# 
# ## Gutenberg project
# tuesdata <- tidytuesdayR::tt_load('2025-06-03')


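library(tidyverse)

# Load the TidyTuesday data from 2022-08-16
characters <- tidytuesdayR::tt_load('2022-08-16')

char_dat <- characters$characters
psych_dat <- characters$psych_stats

# Add the psych stats to each character; with no `by` given,
# left_join() matches on all columns the two tables share
dat_merged <- char_dat %>% 
  rename(char_name = name) %>% 
  left_join(psych_dat) 

# Keep two shows and five traits
dat_prepped <- dat_merged %>% 
  filter(uni_name %in% c("How I Met Your Mother", "Friends")) %>% 
  filter(question %in% c("doer/thinker", "jock/nerd", "cold/warm", "main character/side character", "crazy/sane"))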
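
The merge step relies on left_join() matching on every column name the two tables share. A minimal sketch with invented toy tables (the values are made up, only the column names mirror the ones used above) shows the same steps with the key columns spelled out, which is one possible way to make the join explicit:

```r
library(dplyr)

# Toy stand-ins for char_dat and psych_dat (all values are invented).
char_toy <- tibble(
  name     = c("Ted Mosby", "Rachel Green"),
  uni_name = c("How I Met Your Mother", "Friends")
)
psych_toy <- tibble(
  char_name  = c("Ted Mosby", "Ted Mosby", "Rachel Green"),
  uni_name   = c("How I Met Your Mother", "How I Met Your Mother", "Friends"),
  question   = c("cold/warm", "jock/nerd", "cold/warm"),
  avg_rating = c(70, 65, 60)
)

merged_toy <- char_toy %>%
  rename(char_name = name) %>%
  # Naming the keys avoids the "Joining with ..." message and guards
  # against accidental matches on other shared columns.
  left_join(psych_toy, by = c("char_name", "uni_name"))

merged_toy
```

Each character row is repeated once per matching question, so the result is in long format with one row per character and trait.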

## Cold warm: gender? Could get ChatGPT to code

## Maybe standardize within to show specific deviations?

Photo by Ilse Orsel on Unsplash

Character traits in HIMYM

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name)) +
  geom_point()

We now compare some character traits in HIMYM and Friends. This is still hard to read in a single panel. Solution: faceting.

Faceting

Arranging panels by a single variable in a grid:

facet_wrap()

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name)) +
  geom_point() +
  facet_wrap(vars(char_name), nrow = 4) +
  theme_bg()

facet_grid()

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name)) +
  geom_point() +
  facet_grid(char_name ~ .) +
  theme_bg()

Faceting - Multiple variables

Arranging panels by multiple variables in a grid:

facet_wrap()

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name)) +
  geom_point() +
  facet_wrap(vars(char_name, uni_name), nrow = 4) +
  theme_bg()

facet_grid()

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name)) +
  geom_point() +
  facet_grid(char_name ~ uni_name) +
  theme_bg()
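
One difference between the two functions is easy to miss: facet_wrap() only creates panels for combinations that occur in the data, while facet_grid() lays out the full rows-by-columns grid, including empty panels. A small sketch with invented toy data (four made-up characters, each belonging to exactly one show) counts the panels of both layouts:

```r
library(ggplot2)

# Invented toy data: each character belongs to exactly one show.
toy <- data.frame(
  char_name  = c("A", "B", "C", "D"),
  uni_name   = c("Show 1", "Show 1", "Show 2", "Show 2"),
  avg_rating = c(40, 55, 60, 75)
)

p <- ggplot(toy, aes(x = char_name, y = avg_rating)) +
  geom_point()

p_wrap <- p + facet_wrap(vars(char_name, uni_name))
p_grid <- p + facet_grid(char_name ~ uni_name)

# Count the panels each layout creates.
n_wrap <- nrow(ggplot_build(p_wrap)$layout$layout)  # existing combinations only
n_grid <- nrow(ggplot_build(p_grid)$layout$layout)  # full rows x columns grid
c(wrap = n_wrap, grid = n_grid)
```

With characters nested in shows, facet_grid(char_name ~ uni_name) therefore leaves half of its panels empty, which is a reason to prefer facet_wrap() for nested variables.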

Faceting - Tips

Plot all points

dat_prepped_background <- dat_prepped %>% 
  mutate(char_name_bg = char_name) %>%
  select(-char_name)

ggplot(dat_prepped,  aes(x = question, y = avg_rating, colour = char_name, shape = uni_name)) +
  # background points: drawn in every facet because this data has no char_name column
  geom_point(
    data = dat_prepped_background,
    aes(x = question, y = avg_rating, group = char_name_bg),
    inherit.aes = FALSE,
    color = "grey70",
    alpha = 0.5,
    size = 0.4
  ) +
  geom_point() + 
  facet_wrap(vars(char_name)) +
  guides(color = "none") +
  theme_bg()
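
Why this works: a layer whose data does not contain the faceting variable is repeated in every panel. The following sketch uses invented toy data to check this by counting the rows each layer ends up with after faceting:

```r
library(ggplot2)

# Invented toy data: three characters, two questions each.
toy <- data.frame(
  char_name  = rep(c("A", "B", "C"), each = 2),
  question   = rep(c("q1", "q2"), times = 3),
  avg_rating = c(40, 55, 60, 75, 30, 80)
)
# Same rows, but without the faceting variable char_name.
toy_bg <- toy[, c("question", "avg_rating")]

p <- ggplot(toy, aes(x = question, y = avg_rating)) +
  geom_point(data = toy_bg, colour = "grey70") +  # layer 1: background
  geom_point(aes(colour = char_name)) +           # layer 2: highlighted
  facet_wrap(vars(char_name))

n_bg <- nrow(layer_data(p, 1))  # background rows after faceting
n_fg <- nrow(layer_data(p, 2))  # foreground rows after faceting
c(background = n_bg, foreground = n_fg)
```

The background layer holds its 6 rows once per panel (18 in total), while the foreground layer keeps its 6 rows split across the panels.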

Faceting - Tips

Plot means

dat_mean <- dat_prepped %>% 
  group_by(question) %>%
  summarise(avg_rating = mean(avg_rating))
  

ggplot(dat_prepped,  aes(x = question, y = avg_rating, colour = char_name, shape = uni_name)) +
  # background points: mean rating per question, drawn in every facet
  geom_point(
    data = dat_mean,
    aes(x = question, y = avg_rating),
    inherit.aes = FALSE,
    color = "grey70",
    size = 1
  ) +
  geom_point() + 
  facet_wrap(vars(char_name)) +
  guides(color = "none") +
  theme_bg()

Standardization could help when comparing across questions, but whether it is appropriate depends on the final research question. It is worth keeping in mind either way.
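
One possible form of such a standardization is a z-score within each question, so that every rating is expressed as its distance from the question mean in units of the question's standard deviation. A minimal sketch on invented toy data (the column rating_z is a hypothetical name):

```r
library(dplyr)

# Invented toy data: three characters rated on two questions.
toy <- tibble(
  char_name  = rep(c("A", "B", "C"), times = 2),
  question   = rep(c("cold/warm", "jock/nerd"), each = 3),
  avg_rating = c(20, 50, 80, 60, 70, 80)
)

toy_std <- toy %>%
  group_by(question) %>%
  # z-score: distance from the question mean in units of the question SD
  mutate(rating_z = (avg_rating - mean(avg_rating)) / sd(avg_rating)) %>%
  ungroup()

toy_std
```

After this step both questions have mean 0 and standard deviation 1, so a character's deviations become comparable across questions.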

Sorting

Sorting in ggplot2 generally works via factor(): the order of the factor levels determines the order in the plot. Sometimes it helps to create a dedicated ID variable for sorting.

dat_prepped$uni_name_fac <- factor(dat_prepped$uni_name, levels = c("How I Met Your Mother", "Friends"))
ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() +
  facet_wrap(vars(uni_name_fac), nrow = 4) +
  theme_bg()
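
A sketch of the ID-variable idea, using invented toy data: the questions are sorted by their rating, a helper ID (the hypothetical column sort_id) records that order, and the factor levels are then taken from it.

```r
library(dplyr)
library(ggplot2)

# Invented toy data: one rating per question.
toy <- tibble(
  question   = c("cold/warm", "jock/nerd", "crazy/sane"),
  avg_rating = c(70, 30, 50)
)

toy_sorted <- toy %>%
  arrange(avg_rating) %>%
  mutate(
    sort_id      = row_number(),  # helper ID: 1 = lowest rating
    question_fac = factor(question, levels = question[order(sort_id)])
  )

# The factor levels now follow the rating order.
levels(toy_sorted$question_fac)

p_sorted <- ggplot(toy_sorted, aes(x = question_fac, y = avg_rating)) +
  geom_point()
```

The x axis of p_sorted follows the level order instead of the alphabetical default.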

Scales and legends

Scales

“Scales in ggplot2 control the mapping from data to aesthetics. They take your data and turn it into something that you can see, like size, colour, position or shape.” ggplot2: Elegant Graphics for Data Analysis

Link to aes slide.

Legends

Legends are generated automatically from the aesthetics, i.e. the mapping from data to graphical elements. Each scale is assigned a legend.

Legends and axes are functionally equivalent and are grouped under the term guides in ggplot2. While scales map the data onto graphical properties such as position or colour, guides make this mapping readable again: axes translate positions back into numbers, legends assign colours or symbols to the corresponding data values. They can therefore be understood as the "inverse function" of the respective scales.

Every aesthetic in the plot is linked to exactly one scale:

Implicit definition

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() 

Internally this becomes:

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() +
  scale_x_discrete() + 
  scale_y_continuous() + 
  scale_colour_discrete() +
  scale_shape_discrete()

  • question is a discrete variable: scale_x_discrete()
  • avg_rating is a continuous variable: scale_y_continuous()
  • char_name and uni_name_fac are discrete: scale_colour_discrete(), scale_shape_discrete()
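
Which default scale ggplot2 picks depends on the type of the mapped variable. A small sketch with invented toy data checks this for the position scales, using layer_scales() to look at what was chosen:

```r
library(ggplot2)

# Invented toy data: a character column and a numeric column.
toy <- data.frame(
  question   = c("cold/warm", "jock/nerd", "crazy/sane"),
  avg_rating = c(70, 30, 50)
)

p <- ggplot(toy, aes(x = question, y = avg_rating)) +
  geom_point()

# layer_scales() returns the position scales ggplot2 picked for a layer.
sc <- layer_scales(p)
x_is_discrete <- sc$x$is_discrete()  # character column: discrete x scale
y_is_discrete <- sc$y$is_discrete()  # numeric column: continuous y scale
c(x = x_is_discrete, y = y_is_discrete)
```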

We can take advantage of this to define scales manually.

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() +
  scale_x_discrete(name = "Trait") + 
  scale_y_continuous(name = "Mean rating") + 
  scale_colour_discrete(name = "Character") +
  scale_shape_discrete(name = "Show")

In practice we would use labs(x = "Trait", y = "Mean rating", colour = "Character", shape = "Show") for this. This version shows, however, that axis and legend titles are simply scale names.
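
The two routes can be put side by side in a short sketch. The toy data and the titles are invented for illustration; the point is only that a scale name and labs() set the same axis title.

```r
library(ggplot2)

# Invented toy data for illustration.
toy <- data.frame(
  question   = c("cold/warm", "jock/nerd"),
  avg_rating = c(70, 30)
)
p <- ggplot(toy, aes(x = question, y = avg_rating)) +
  geom_point()

# Variant 1: titles set as scale names
p_scale <- p +
  scale_x_discrete(name = "Trait") +
  scale_y_continuous(name = "Mean rating")

# Variant 2: the same titles set via labs()
p_labs <- p + labs(x = "Trait", y = "Mean rating")

# The x scale of variant 1 carries the title as its name ...
x_name_scale <- layer_scales(p_scale)$x$name
# ... while labs() stores the same title in the plot's labels.
x_name_labs <- p_labs$labels$x
```

Both plots render with identical axis titles; labs() is simply the shorter way to write it when nothing else about the scale changes.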

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = avg_rating, shape = uni_name_fac)) +
  geom_point() +
  scale_y_log10(name = "Mean rating (log scale)") +
  scale_colour_continuous()

An overview of the available scale types can be found [here](https://ggplot2tor.com/scales/).

Use cases: Colours

Colours

It often makes sense to define the colours directly via a named vector. This assigns exactly the desired colour to each value of the colour variable:

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c("Friends" = 12, "How I Met Your Mother" = 18)) +
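  scale_colour_manual(values = c("Ted Mosby" = "blue", 
                                   "Robin Scherbatsky" = "red", 
                                   "Barney Stinson" = "green", 
                                   "Lily Aldrin" = "purple", 
                                   "Marshall Eriksen" = "orange", 
                                   "Rachel Green" = "pink", 
                                   "Monica Geller" = "brown", 
                                   "Phoebe Buffay" = "yellow", 
                                   "Joey Tribbiani" = "cyan",
                                   # remaining Friends leads; names without an entry
                                   # would be drawn in the NA colour
                                   "Chandler Bing" = "magenta",
                                   "Ross Geller" = "black"))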

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() +
  scale_colour_brewer(palette = "Set3")
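
Typing out every name by hand makes it easy to miss one. As a sketch of an alternative, the named vector can be built from the values that occur in the data. The toy data is invented, and hcl.colors() with the "Dark 3" palette is just one possible colour source:

```r
library(ggplot2)

# Invented toy data: three characters, one question.
toy <- data.frame(
  char_name  = c("A", "B", "C"),
  question   = "cold/warm",
  avg_rating = c(40, 60, 80)
)

# Build the named vector from the values in the data, so that
# every character is guaranteed to have a colour.
chars <- sort(unique(toy$char_name))
char_cols <- setNames(hcl.colors(length(chars), palette = "Dark 3"), chars)

p <- ggplot(toy, aes(x = question, y = avg_rating, colour = char_name)) +
  geom_point(size = 3) +
  scale_colour_manual(values = char_cols)

# Colours the scale resolves for the plotted rows.
drawn <- layer_data(p)$colour
```

Because the names come from the data, the same vector can be reused across plots and keeps each character's colour stable.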

:::

Use cases: Scale ticks

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() +
  scale_y_continuous(name = "Mean rating", breaks = seq(0, 100, by = 10), limits = c(0, 100))

Scale Guides

Every scale (and with it every aesthetic) is assigned a guide. Internally this happens via guides(). We can therefore use guides() to modify the legend:

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() +
  guides(color = guide_legend(title = "Characters", ncol = 3, reverse = TRUE, override.aes = list(size = 3)))

Available guide functions

::: {.columns}

  • guide_colourbar()
  • guide_coloursteps()
  • guide_axis()
  • guide_legend()
  • guide_bins()

ggplot(data = dat_prepped, 
        mapping = aes(x = question, y = avg_rating, colour = char_name, shape = uni_name_fac)) +
  geom_point() +
  guides(x = guide_axis(angle = 90))

:::

Themes

Many legend settings would also be changed here?

Coordinate system

  • Polar, maybe the characters plot or a space plot as an example

Saving

Vector vs. raster (Rolfs 7)

Use characters data for demonstration or for exercise?